Maize centromeres expand and adopt a uniform size 
in the genetic background of oat 
Most existing centromeres may have originated as neocentromeres that activated de novo from noncentromeric regions. 
However, the evolutionary path from a neocentromere to a mature centromere has been elusive. Here we analyzed the 
centromeres of nine chromosomes that were transferred from maize into oat as the result of an inter-species cross. 
Centromere size and location were assayed by chromatin immunoprecipitation for the histone variant CENH3, which is 
a defining feature of functional centromeres. Two isolates of maize chromosome 3 proved to contain neocentromeres in 
the sense that they had moved from the original site, whereas the remaining seven centromeres (1, 2, 5, 6, 8, 9, and 10) were 
retained in the same area in both species. In all cases, the CENH3-binding domains were dramatically expanded to 
encompass a larger area in the oat background (~3.6 Mb) than the average centromere size in maize (~1.8 Mb). The 
expansion of maize centromeres appeared to be restricted by the transcription of genes located in regions flanking the 
original centromeres. These results provide evidence that (1) centromere size is regulated; (2) centromere sizes tend to be 
uniform within a species regardless of chromosome size or origin of the centromere; and (3) neocentromeres emerge and 
expand preferentially in gene-poor regions. Our results suggest that centromere size expansion may be a key factor in the 
survival of neocentric chromosomes in natural populations. 

[Supplemental material is available for this article.] 



Centromeres can be stable for hundreds of thousands of years, but 
under rare circumstances have been known to change positions 
along the chromosomes. Examples of centromere repositioning 
have been documented in both plant and animal species as 
revealed by comparative genomics (Han et al. 2009; Rocchi et al. 
2012). An early example involved the comparison of X chromo- 
somes from human and two lemur species (Ventura et al. 2001). 
Gene order is strongly conserved on the three X chromosomes, yet 
the centromeres are in different locations, indicating that the cen- 
tromeres underwent dramatic and yet poorly understood reposi- 
tioning events (Ventura et al. 2001). One way to study centromere 
repositioning is to focus on newly established centromeres known 
as neocentromeres. There are many known neocentromere exam- 
ples in human clinical samples (Voullaire et al. 1993; Marshall et al. 
2008) as well as in different animal and plant species (Williams et al. 
1998; Maggert and Karpen 2001; Nasuda et al. 2005; Ishii et al. 2008; 
Ketel et al. 2009; Topp et al. 2009; Fu et al. 2013). Most newly formed 
neocentromeres lie in moderately repetitive genomic regions inter- 
spersed with single-copy sequences (Marshall et al. 2008), whereas 
nearly all mature centromeres contain long arrays of satellite repeats 
(Henikoff et al. 2001; Jiang et al. 2003). The transition from a neo- 
centromere to a stable mature centromere presumably involves the 
accumulation of repeats over long time frames (Yan et al. 2006; 
Kalitsis and Choo 2012). 

Centromere identity is conferred epigenetically by the pres- 
ence of the specialized histone H3 variant known as CENPA in 
humans (Earnshaw and Rothfield 1985) and CENH3 in plants 
(Talbert et al. 2002). The distribution of CENH3-containing
nucleosomes within the boundaries of centromeres is not well
understood, although it appears to be discontinuous and
interspersed with canonical nucleosomes (Blower et al. 2002; Yan et al.
2008). Some human neocentromeres and several plant centromeres
contain genes embedded as islands within centromeres
(Saffery et al. 2003; Nagaki et al. 2004; Gong et al. 2012). While
genes may closely border CENH3-containing nucleosomes, gene
transcription is generally incompatible with CENH3 (Ketel et al.
2009). Centromeres in higher eukaryotes usually span hundreds 
of kilobases of sequence and often do not appear to have sharp 
edges, at least as determined by chromatin immunoprecipitation 
(ChIP) of CENH3 from complex plant tissues (Yan et al. 2008; 
Gong et al. 2012). The total number of CENH3 nucleosomes is 
positively correlated with genome size (Zhang and Dawe 2012), 
but centromere size does not necessarily correlate with chromosome 
size. For example, in the budding yeast Saccharomyces cerevisiae, 
each of the 16 centromeres contains a single nucleosome (Meluh 
et al. 1998; Henikoff and Henikoff 2012), although the largest 
chromosome (1532 kb) is six times bigger than the smallest 
chromosome (230 kb) (Goffeau et al. 1996). More strikingly, al- 
though the sizes of chicken (Gallus gallus) macrochromosomes 
and microchromosomes are vastly different (Hillier et al. 2004), 
all chicken chromosomes appear to have kinetochores of a similar 
size (Johnston et al. 2010). For instance, the Z chromosome (~75 Mb) 
is 15 times bigger than chromosome 27 (~5 Mb), but the centro- 
meres of both chromosomes have a 30- to 40-kb CENPA-binding 
domain (Shang et al. 2010). 

Plants are known for the capacity to tolerate very wide and even
interspecies crosses. Two of the most distant plant species to ever
be crossed are oat (Avena sativa, 2n = 6x = 42) and maize (Zea mays,
2n = 2x = 20), which diverged nearly 60 million years ago. Most of
the maize chromosomes are stochastically lost, and progeny often
retain just one maize chromosome in the oat background (Kynast et al.
2001). Since the oat genome (11,300 Mb) is over four times bigger than
the maize genome (2500 Mb), and total (summed) centromere size scales
linearly with genome size (Zhang and Dawe 2012), we predicted that
maize centromeres would expand in the oat background. We mapped the
CENH3-binding domains of two maize neocentric chromosomes and seven
normal maize chromosomes after transfer to the oat background. All
nine centromeres showed a dramatic expansion of roughly twofold,
principally into regions of low gene density. These results
illuminate the process of centromere reorganization that follows
wide species crosses. Centromere size variance may be a key factor
that contributes to chromosome loss following such crosses, and
centromere expansion may be an important adaptation that allows new
centromeres to stabilize. 
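
The prediction can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that total centromere size is strictly proportional to genome size and is shared equally among the centromeres of a species; the function name and the use of the somatic chromosome numbers (42 and 20) are illustrative choices, not part of the original analysis.

```python
def expected_centromere_ratio(genome_a_mb, n_chrom_a, genome_b_mb, n_chrom_b):
    """Per-centromere size of species A relative to species B.

    Assumption (for illustration only): total centromere size scales
    linearly with genome size and is divided equally among chromosomes.
    """
    per_centromere_a = genome_a_mb / n_chrom_a
    per_centromere_b = genome_b_mb / n_chrom_b
    return per_centromere_a / per_centromere_b


# Genome sizes (Mb) and chromosome numbers as quoted in the text.
ratio = expected_centromere_ratio(11_300, 42, 2_500, 20)
print(f"Expected oat/maize centromere size ratio: {ratio:.2f}")
```

Under these assumptions the ratio is a little above 2, which is the basis for expecting an approximately twofold expansion.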



Results 

Confirmation that neoM3 is an isochromosome derived 
from the short arm of maize chromosome 3 

Several maize lines have been used to develop oat-maize chromo- 
some addition lines (oat strains containing one maize chromo- 
some). The first maize line used was a sweet corn hybrid known as 
Seneca 60. One of the Seneca 60 chromosomes identified in oat 
was a fragment of chromosome 3 that contained a neocentromere 
(Topp et al. 2009). This neocentric chromosome, neoM3, was re- 
covered as a derivative from a full Seneca 60 chromosome 3 addi- 
tion line called OMA3.01. Staining of the neoM3 chromosome 
with anti-CENH3 antibodies suggests that it is an isochromosome 
with two identical chromosome arms (the arm ratio is 1.03 ± 0.02, 
n = 20) (Fig. 1A,B). To confirm this, we isolated an 8.7-kb DNA 
segment (m3S8.7) from the distal region on the short arm of maize 
chromosome 3. Fluorescence in situ hybridization (FISH) using 
m3S8.7 as a probe produced a single hybridization signal on the 
short arm of maize chromosome 3 in OMA3.01 (Fig. 1C) but 
generated signals on both chromosomal ends of neoM3 (Fig. 1E). 

Mapping the CENH3-binding domain of neoM3 (nCenM3) 

Maize centromeres contain arrays of two intermingled repetitive 
DNA elements, including a 156-bp satellite repeat CentC (Ananiev 
et al. 1998) and a centromeric retrotransposon CRM (Zhong et al. 
2002). The CentC/CRM arrays span several megabases of DNA in 
some maize centromeres (Jin et al. 2004; Ananiev et al. 2009). The 
CENH3-binding domains of these heavily repetitive centromeres 
generally cannot be delineated by sequencing-based approaches. 




Figure 1. Cytological characterization of the neocentric chromosome neoM3. (A) Immunofluorescence
assay of the oat-maize neoM3 addition line using anti-CENH3 antibodies. The arrow points to the
CENH3 signal on the neoM3 chromosome. (B) The neoM3 chromosome is identified by sequential
genomic in situ hybridization (GISH) of the same metaphase cell using maize genomic DNA as a probe.
(C) The two copies of maize chromosome 3 (arrows) in the oat-maize addition line OMA3.01 are
detected by FISH using an 8.7-kb DNA probe amplified from the distal region on the short arm. (D)
Identification of the maize chromosomes in the same metaphase cell as assayed by GISH. (E) FISH
mapping of the 8.7-kb DNA probe on the neoM3 chromosome. Note: The probe hybridizes to both
ends of the neoM3 chromosome (arrows). (F) The identification of neoM3 in the same metaphase cell is
confirmed by GISH. Bars, 10 μm. 



However, the CentC/CRM arrays account for only a portion of the 
CENH3-binding domains in several other maize centromeres, and 
in these cases, the CENH3 boundaries can be defined by mapping 
the sequences associated with CENH3 nucleosomes (Wolfgruber 
et al. 2009). 

As a control for all maize chromosomes, we first conducted 
CENH3 ChIP, followed by Illumina sequencing (ChIP-seq) of the 
reference maize inbred line B73. This replicates prior ChIP exper- 
iments on B73 using lower-coverage 454 sequencing (Wolfgruber 
et al. 2009). We obtained a total of 84 million (M) paired sequence 
reads, including 12.9 M reads (one end or both ends of a paired 
read, 7.7% of the 168 M total ends) related to CentC or CRM re- 
peats. We mapped 40 M read pairs to unique positions in the B73 
reference genome (version 2). The CENH3-binding domain of B73 
chromosome 3 (Cen3) was mapped between positions 99.78 and 
100.76 Mb (chromosome 3 is 232.1 Mb long) (Fig. 2A; Table 1), 
which is in agreement with the Cen3 position mapped previously 
based on a total of 149,756 ChIP-454 sequence reads (Wolfgruber 
et al. 2009). We note that B73 Cen3 may, in fact, be larger than 
~1 Mb, since the assembly is not complete for this centromere. 

We then conducted a ChIP-seq analysis of the neoM3 line. We 
generated 57 M paired reads and mapped 1.49 M reads to maize 
chromosome 3. Significant sequence enrichment was observed at 
positions 78.2-80.3 Mb of maize chromosome 3 (Fig. 2B). The 
read distribution on chromosome 3 showed a cliff-like drop-off 
around position 80.6 Mb, and very few sequence reads were 
mapped beyond 80.6 Mb (Fig. 2B), suggesting that neoM3 was 
broken in this region and the duplicated short arms fused to 
create an isochromosome. As the CENH3 ChIP-seq reads on all other
maize chromosomes, and on chromosomes of several other species,
show a bell-shaped distribution (Yan et al. 2008; Gong 
et al. 2012), the complete CENH3-binding domain in nCenM3 
likely includes the 78.2-80.3 Mb region on both arms. Thus, the 
CENH3-binding domain of nCenM3 includes a minimum of 2.4 
Mb (78.2-80.6 Mb) and likely spans 4.8 Mb, which is signifi- 
cantly larger than the mapped size of Cen3 in B73. 
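
As a rough illustration of how read density can be summarized before a domain is called, the sketch below bins uniquely mapped read positions into 10-kb windows and reports the span covered by windows above a read-count cutoff. The cutoff (`min_reads`), the function names, and the toy positions are hypothetical; the criterion actually used to set the domain boundaries is not specified in this section. The last lines reproduce the size arithmetic for the isochromosome, assuming the mapped arm is mirrored.

```python
from collections import Counter

WINDOW_BP = 10_000  # 10-kb windows, matching the read-density plots


def reads_per_window(positions_bp, window_bp=WINDOW_BP):
    """Count reads per fixed-size window; keys are window start coordinates."""
    counts = Counter((pos // window_bp) * window_bp for pos in positions_bp)
    return dict(counts)


def enriched_span(counts, min_reads, window_bp=WINDOW_BP):
    """Return (start_bp, end_bp) covering all windows with >= min_reads reads.

    Returns None when no window passes the (hypothetical) cutoff.
    """
    hits = sorted(start for start, n in counts.items() if n >= min_reads)
    if not hits:
        return None
    return hits[0], hits[-1] + window_bp


# Toy read positions (hypothetical, not data from the study).
toy_reads = [78_205_000, 78_206_500, 78_207_900,
             79_000_100, 79_000_200, 79_004_000,
             120_000_000]
counts = reads_per_window(toy_reads)
print(enriched_span(counts, min_reads=3))

# Size arithmetic from the text: the arm segment 78.2-80.6 Mb, mirrored
# on both arms of the isochromosome.
mirrored_size_mb = round(2 * (80.6 - 78.2), 1)
print(mirrored_size_mb)
```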

Figure 2. Mapping of the centromere and neocentromeres on maize chromosome 3. (A) Mapping of ChIP-seq reads from B73 on maize chromosome
3. The CENH3-binding domain of Cen3, marked by a pink box, was mapped to the region between 99.78 and 100.76 Mb. The y-axis shows the number of
ChIP-seq reads in 10-kb windows along chromosome 3. (B) Mapping of ChIP-seq reads from the neoM3 line on maize chromosome 3. The CENH3-binding
domain of nCenM3, marked by a green box, was mapped to the region between 78.2 and 80.3 Mb. Background reads were detected throughout
0 to 80.6 Mb. The y-axis shows the number of ChIP-seq reads in 10-kb windows along chromosome 3. Chromosome 3 sequence of maize B73 is used as
the reference sequence. (C) Mapping of ChIP-seq reads from the OMA3.01 line on maize chromosome 3. The CENH3-binding domain of nCen3, marked
by a pink box, was mapped to 79.3-83.9 Mb. The y-axis shows the number of ChIP-seq reads in 10-kb windows along chromosome 3. (D) Distribution of
non-TE genes on maize chromosome 3. The y-axis shows the percentage of non-TE genes in 10-kb windows. (E) Distribution of TE-related genes on maize
chromosome 3. The y-axis shows the percentage of TE-related genes in 10-kb windows. The vertical pink and green boxes across all panels indicate the
positions of the neocentromere and original centromere on maize chromosome 3. 



Cen3 was repositioned when initially transferred to oat 

A simple comparison of the location of nCenM3 to the position of 
B73 Cen3 would suggest that nCenM3 is far removed from the 
natural centromere location. However, nCenM3 is not derived from 
B73 — it is derived from the oat line OMA3.01, which contains 
chromosome 3 originally derived from Seneca 60. Therefore, we also 
conducted a ChIP analysis of OMA3.01. We generated 212 M paired 
reads and mapped 1.39 M reads to maize chromosome 3. Surprisingly,
we found that the CENH3-binding domain of OMA3.01 
(nCen3) is also displaced relative to B73, and spans 4.6 Mb between 
positions 79.3 and 83.9 Mb (Fig. 2C; Table 1). The two new cen- 
tromeres, nCen3 and nCenM3, partially overlap in the region of 
79.3-80.6 Mb, suggesting that neoM3 was most likely derived 
from a centromeric misdivision event within nCen3 (Fig. 3). 

We then questioned whether the maize Seneca 60 line natu- 
rally contains a centromere in a different position than B73 using 
an assay that involves immunofluorescence for CENH3, followed 
by FISH using a CentC probe. FISH mapping results showed that 
maize chromosome 3 retained the CentC repeats in the OMA3.01 
oat line (Fig. 4A). Thus, the maize chromosome has maintained 
the DNA sequences from its original centromere. We analyzed 
chromosome 3 in 21 metaphase cells. The signals from CENH3 
and CentC were completely separated on 15 chromosomes (71%)
(Fig. 4A), partially overlapped on two chromosomes, and completely
overlapped on four chromosomes. By comparison, among
the 15 copies of chromosome 3 analyzed in the maize Seneca 60 line, the
CENH3 and CentC signals were completely separated from each
other on only three chromosomes (20%), partially overlapped on
one chromosome, and completely overlapped on 11 chromosomes
(Fig. 4B). These results show that centromere 3 of Seneca 60
underwent a repositioning event during the formation of OMA3.01. 
Both of the chromosome 3 centromeres are neocentromeres in the 
formal sense: nCen3 was newly formed upon introduction into 
oat, while nCenM3 occurred secondarily as an outcome of a mis- 
division event that further shifted the position toward the short 
arm (Figs. 2, 3). 



nCen3 and nCenM3 formed in gene desert regions 

We found that the position of nCen3 (79.3-83.9 Mb) represents one 
of the most gene-deficient regions on chromosome 3 (Fig. 2D,E). 
Only 21 of the 4197 nontransposable element (non-TE) genes an- 
notated on chromosome 3 were found in this 4.6-Mb domain. The 
gene density in nCen3 is one gene per 219 kb, compared to one gene 
per 55 kb for the average of chromosome 3. A random sampling of 
4.6-Mb regions from chromosome 3 suggests that there is only
a 2.7% chance of selecting a 4.6-Mb region containing <21 non-TE
genes. 

Table 1. Sizes of maize centromeres in the native and oat backgrounds

            Centromere position in            Centromere size in
Centromere  Maize (Mb)      Oat (Mb)          Maize (kb) b   Oat (kb)   Size difference (kb)
Cen1        a               131.68-134.99     -              3307       -
Cen2        92.70-94.73     91.21-94.73       2027           3513       1486
Cen3        99.78-100.76    79.34-83.90       982            4560       3578
Cen5        102.09-103.99   101.89-105.42     1906           3529       1623
Cen6        a               47.13-50.70       -              3572       -
Cen8        49.43-51.40     48.22-52.54       1969           3840 c     1871
Cen9        51.42-53.08     51.43-55.17       1651           3742       2091
Cen10       50.28-51.72     48.30-51.60       1436           3307       1871

a The CENH3-binding domains of Cen1 and Cen6 in the maize background are most likely embedded
within massive CentC arrays and thus cannot be delineated.

b Cen2 is the best-sequenced maize centromere, and most of the Cen2 sequences are likely covered by
the current chromosome 2 pseudomolecule. Other maize centromeres may contain more CentC repeats
than Cen2. Since the CentC repeats may not be included in the pseudomolecules, the sizes of
these centromeres are underestimated.

c The deletion region in Cen8, which was marked in Figure 8, was not included in calculation. 
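
The random-sampling comparison for nCen3 can be sketched as follows. This is a minimal illustration, not the procedure used for the 2.7% estimate: the function name, the number of samples, the uniform choice of window start, and the synthetic gene positions are all assumptions, and the threshold is applied as "fewer than" to match the text as printed.

```python
import random
from bisect import bisect_left


def fraction_of_sparse_windows(gene_starts_bp, chrom_len_bp, window_bp,
                               gene_threshold, n_samples=10_000, seed=0):
    """Fraction of randomly placed windows holding fewer than
    `gene_threshold` gene start positions (empirical estimate)."""
    rng = random.Random(seed)
    starts = sorted(gene_starts_bp)
    sparse = 0
    for _ in range(n_samples):
        left = rng.randrange(0, chrom_len_bp - window_bp + 1)
        right = left + window_bp
        # Number of gene starts falling in the half-open window [left, right).
        n_genes = bisect_left(starts, right) - bisect_left(starts, left)
        if n_genes < gene_threshold:
            sparse += 1
    return sparse / n_samples


# Synthetic, uniformly scattered gene starts (hypothetical; a real test
# would use the annotated non-TE gene coordinates on chromosome 3).
_rng = random.Random(1)
synthetic_genes = [_rng.randrange(232_100_000) for _ in range(4197)]
frac = fraction_of_sparse_windows(synthetic_genes, 232_100_000,
                                  4_600_000, gene_threshold=21,
                                  n_samples=2_000)
print(f"Fraction of sampled windows with <21 genes: {frac:.3f}")
```

With real annotation, genes cluster toward the chromosome arms, so the fraction would differ from what uniformly scattered synthetic positions give.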

Similarly, the position of nCenM3 (78.2-80.3 Mb) also repre- 
sents a gene-deficient region. Only nine non-TE genes were found 
in this 2.1-Mb domain, representing a gene density of one gene per 
233 kb of DNA. 
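
The gene densities quoted above follow directly from the region sizes and gene counts given in the text. The short calculation below only reproduces that arithmetic; the helper name is illustrative.

```python
def kb_per_gene(region_mb, n_genes):
    """Average spacing between genes, in kb, for a region of the given size."""
    return region_mb * 1000 / n_genes


# Region sizes (Mb) and non-TE gene counts as quoted in the text.
densities = {
    "nCen3": kb_per_gene(4.6, 21),
    "nCenM3": kb_per_gene(2.1, 9),
    "chromosome 3": kb_per_gene(232.1, 4197),
}
for label, value in densities.items():
    print(f"{label}: one gene per {value:.0f} kb")
```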

Transcription of genes within the neocentromeres 

CENH3 ChIP-seq reads were distributed unevenly within nCenM3 
and nCen3, resulting in alternating subdomains enriched or de- 
pleted for CENH3 (Fig. 5). The CENH3-depleted regions most likely 
contain H3 nucleosomes, as has been demonstrated in rice
centromeres. Active genes were detected in the CENH3-depleted
regions in several rice centromeres (Yan et al. 2008). 

We conducted RNA-seq in OMA3.01 and neoM3 (two biological
replicates; see Methods) to examine the transcription of the 21
and nine genes annotated within nCen3 and nCenM3, respectively. Only
nine of the 21 genes in nCen3 showed transcription in OMA3.01
(FPKM > 1) (Supplemental Table S1). All nine genes were located in
CENH3-depleted subdomains (Fig. 5; Supplemental Fig. S3). Similarly,
we detected very low amounts of RNA-seq reads (FPKM =
0-12) for six of the nine genes in nCenM3 (Supplemental Table S1).
Transcription of the remaining three genes was detected in both
nCenM3 and nCen3 (Fig. 5). The first two genes, G959 and G576,
were associated with CENH3-depleted subdomains. The third gene
(G311) is unusually large (42.8 kb) and consists of seven small
exons that together make up the 1.2-kb coding sequence
(Supplemental Fig. S1; Fig. 6). Interestingly, this gene contains a small
CENH3 subdomain that includes 4175 bp of the first intron,
20,110 bp of the second intron, and a 48-bp exon located in the
middle of these two introns (Supplemental Fig. S1). The enrichment
of CENH3 within this region was confirmed by ChIP-PCR (see
Methods; Supplemental Fig. S2). In summary, only 12 expressed
genes were found within the two neocentromeres, and all were
located within subdomains that lack CENH3, except for a portion of
gene G311 in nCenM3. 
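
The comparison between expression and CENH3 occupancy amounts to an interval-overlap test. The sketch below shows one way to tabulate it, using the FPKM > 1 cutoff mentioned above; the data structures, gene names, and coordinates are hypothetical placeholders rather than values from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive


class Gene(NamedTuple):
    name: str
    span: Interval
    fpkm: float


def overlaps(a, b):
    """True when two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def classify(genes, cenh3_subdomains, fpkm_cutoff=1.0):
    """Return (name, expressed, overlaps_cenh3) for every gene."""
    rows = []
    for gene in genes:
        expressed = gene.fpkm > fpkm_cutoff
        in_cenh3 = any(overlaps(gene.span, d) for d in cenh3_subdomains)
        rows.append((gene.name, expressed, in_cenh3))
    return rows


# Hypothetical CENH3-enriched subdomains and genes (toy coordinates).
subdomains = [Interval(100, 200), Interval(400, 500)]
genes = [
    Gene("geneA", Interval(220, 260), 35.0),  # expressed, outside CENH3
    Gene("geneB", Interval(150, 180), 0.2),   # silent, inside CENH3
    Gene("geneC", Interval(190, 230), 12.0),  # expressed, partly inside
]
table = classify(genes, subdomains)
for row in table:
    print(row)
```

A gene such as the toy `geneC`, which is expressed yet partly overlaps a CENH3-enriched subdomain, corresponds to the kind of exception described for G311.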

Alteration of gene expression due to neocentromere activation 

We were interested in whether the transcription of the centromeric
genes was altered due to neocentromere activation.
Because both neoM3 (Fig. 1B) and OMA3.01 (Fig. 1D) lines contain two
copies of the chromosomal segment spanning 0-80.6 Mb, the
transcription levels of genes within this chromosomal segment can be
directly compared. Three active genes (G959, G576, and G311) are
associated with both lines (Fig. 5). Gene G959 is located
within nCenM3 but outside of nCen3. This gene showed a similar level
of transcription in the two lines (P = 0.22, 107 FPKM in neoM3 and
75.6 FPKM in OMA3.01). Gene G576 is likewise located within nCenM3
but outside of nCen3; however, it showed a higher level of
transcription in OMA3.01 (341.4 FPKM) than in neoM3 (118.5 FPKM,
P = 0). Gene G311 is located within both nCenM3 and nCen3. The amount
of G311 transcript was significantly higher in neoM3 (410.8 FPKM)
than in OMA3.01 (243.9 FPKM, P = 1.1 × 10^-10). 

We also conducted quantitative real-time PCR (qPCR) to 
confirm the differential expression. The maize specificity of each 
primer was examined using oat, maize, and oat-maize addition 
lines (Fig. 6). The qPCR results were consistent with the RNA-seq 
data: We did not detect a significant difference for the amounts of 
G959 transcript in the two lines. However, the amount of G576 
transcript was significantly lower in neoM3 than in OMA3.01, and 
the amount of G311 transcript was significantly higher in neoM3
than in OMA3.01 (Supplemental Fig. S4). These data suggest that
neocentromere formation can alter gene expression but that the
effects are not dramatic. 

Figure 3. A diagrammatic illustration of repositioning and expansion of
maize Cen3 in the genetic background of oat. (A) Cen3 repositioned to
a short-arm domain that is ~16 Mb away from its original location,
resulting in a neocentromere nCen3. nCen3 has also significantly
expanded compared to Cen3. (B) A misdivision occurred in nCen3. The red
arrow points to the approximate position of the misdivision. The red bar
represents the centromeric DNA (CentC and CRM repeats) associated
with the original Cen3. (C) The short arm derived from the misdivision
formed an isochromosome. (D) The centromere of the original
isochromosome expanded, resulting in the current version of nCenM3. 

Figure 4. Locations of CENH3 and CentC on maize chromosome 3. (A)
Locations of CENH3 (green) and CentC (red) on chromosome 3 in
OMA3.01. The CENH3 and CentC signals on maize chromosome 3 are
exemplified in the large square. Note: The CENH3 signals are shifted away
from the CentC signals toward the short-arm direction. (B) Locations of
CENH3 (green) and CentC (red) on chromosome 3 in Seneca 60. The
CENH3 and CentC signals on one chromosome 3, which are completely
overlapped, are shown in the large square. Arrows point to the FISH signals
derived from the 8.7-kb probe associated with the short arm of
chromosome 3. Bars, 10 μm. 



Expansion of maize centromeres in the genetic background 
of oat 

Comparison of the sizes of the CENH3-binding domains between
Cen3 in B73 (0.98 Mb) and nCen3 and nCenM3 in oat (4.6 Mb and
4.8 Mb, respectively) suggests that the maize centromeres
expanded significantly in the oat background. To test whether this is
a general phenomenon, we conducted CENH3 ChIP-seq in seven
other oat-maize chromosome addition lines developed from maize
inbred B73 (maize chromosomes 1, 2, 5, 6, 8, 9, and 10) (Rines et al.
2009). Since B73 has been fully sequenced (Schnable et al. 2009),
the sizes of individual B73 centromeres in the maize and oat
backgrounds can be directly compared. 

Cen2 is the best sequenced B73 centromere because it contains
the least amount of CentC repeats (Wolfgruber et al. 2009). We
mapped B73 Cen2 to the region between 92.70 and 94.73 Mb that
spans 2.03 Mb, similar to prior data (Fig. 7A; Table 1; Wolfgruber
et al. 2009). However, Cen2 of B73 in the oat background mapped
to the region between 91.21 and 94.73 Mb, covering 3.51 Mb of
the chromosome (Fig. 7B; Table 1). Interestingly, the expansion
of Cen2 occurred exclusively in the short-arm direction. Mapping
of all annotated non-TE genes in chromosome 2 revealed that
the expanded region (91.21-92.70 Mb) represents one of the
most gene-deficient regions on chromosome 2. Only nine of the
4766 genes annotated on chromosome 2 were mapped within this
1.49-Mb region (P = 0.056). We found a large gene (58.2 kb) located
23 kb away from the boundary of the CENH3 domain on the
long arm (Fig. 7C). Expression of this gene has been confirmed by
RNA-seq from leaf tissues (Li et al. 2010). We hypothesize that the
long transcribed domain associated with this gene may prevent the
expansion of Cen2 in the long-arm direction. 

We also sequenced the ChIPed DNAs from six other oat-B73
chromosome addition lines. Significant expansion was observed in
all six centromeres in the oat background (Fig. 8; Table 1). The
CENH3-binding domains of Cen1 and Cen6 cannot be delineated
in B73 because these two centromeres contain large amounts of
CentC repeats (Albert et al. 2010); thus, the CENH3-binding
domains are presumably contained entirely within the CentC repeat
arrays. In the oat background, however, CENH3 binding was
detected in the region between 131.68 and 134.99 Mb in Cen1, and
between 47.13 and 50.70 Mb in Cen6, respectively (Fig. 8). Thus,
these two centromeres expanded from the CentC repeat arrays
into the flanking regions that can be delineated by ChIP-seq. 

Cen5, Cen8, Cen9, and Cen10 of B73 contain few CentC repeats
(Albert et al. 2010). Their sizes in maize ranged from 1.4 Mb
to 1.9 Mb, whereas in oat they were roughly two times larger,
ranging from 3.3 Mb (Cen1 and Cen10) to 3.8 Mb (Cen8) (Fig. 8;
Table 1). The expansion of Cen5 and Cen8 was bidirectional. In
contrast, the expansion of Cen9 and Cen10 was exclusively in the
long arm and short arm, respectively (Fig. 8). These results further
support the assertion that the direction of expansion is not
random and is likely restricted by the presence of actively transcribed
genes. 

We also conducted genomic in situ hybridization (GISH)
using oat genomic DNA as a probe to test the possibility that oat
sequences may have invaded the maize centromeres. Unambiguous
hybridization signals were only detected in the telomeric regions
of maize chromosome 2 in line OMA2.51 (six different oat-maize
addition lines were assayed) (Supplemental Fig. S5). There was
no evidence of oat sequences in any of the introduced maize
centromeres. 



Discussion 

Centromere expansion: A requirement for survival 
of evolutionary new centromeres? 

We demonstrate in nine cases that centromeres transferred from 
maize into oat increase in size, including two cases where the maize 
centromeres moved to different locations. Oat and maize are effec- 
tively cross-incompatible, and plants can only be recovered after 
embryo culture. Like many similar crosses (Laurie and Bennett 
1988), when embryos survive, they are usually haploid for one of 
the contributing genomes (in this case, oat). Centromere in- 
compatibility appears to be the primary cause for genome elimi- 
nation in hybrids (Sanei et al. 2011). An analysis of diploidized 
plants from the oat-maize hybrids revealed that some progeny 
retain maize chromosomes that have presumably undergone a 
process of centromere inactivation followed by re-assembly. The 
process of recovering a maize chromosome in oat is roughly 
equivalent to a whole-chromosome transfer event followed by 
strong selection for stable chromosome transmission. 

A prior analysis of centromere size in 10 grass species dem- 
onstrated that centromere size is correlated with genome size such 
that the "total centromere area" is equally distributed among the 
available chromosomes (Zhang and Dawe 2012). A simple tetra- 
ploidization event is not expected to affect centromere size 
because the number of centromeres increases accordingly. The 
oat-maize comparison is different: The oat genome is four times 
bigger than maize, but the number of centromeres is only doubled
(42 versus 20). Therefore, we anticipated that any given oat
centromere would be roughly twice as large as a maize centromere. Data
shown here demonstrate that maize centromeres transferred into
oat tend to stabilize at a size of ~3.6 Mb, which is, indeed, close to
twice the size of the average maize centromere, which appears to be
closer to 1.8 Mb. The two unequivocal neocentromeres described
here, nCen3 and nCenM3, show similar sizes to all other expanded
maize centromeres. This observation, taken together with the
general observation that centromere sizes are uniform
within species and do not correlate with chromosome size (Johnston
et al. 2010; Henikoff and Henikoff 2012; Zhang and Dawe 2012),
suggests that each species maintains a centromere size equilibrium.
A consistent centromere size may be favorable for chromosome
alignment, as it would allow each chromosome to have an equal
likelihood of being attached by a similar number of microtubules,
which may be essential for random segregation and distribution of
chromosomes during meiosis. 

Figure 5. Mapping of CENH3 binding and gene expression in the neocentromeres. (A) The positions of non-TE genes annotated in nCenM3 and nCen3,
which overlap in the 79.3-80.3 Mb region. (B) Distribution of ChIP-seq reads in nCenM3. Each black bar represents the number of ChIP-seq reads (y-axis) in
a 1-kb window. (C) Gene expression value (FPKM, y-axis) based on RNA-seq in the neoM3 line. (D) Distribution of ChIP-seq reads in nCen3. Each black bar
represents the number of ChIP-seq reads (y-axis) in a 1-kb window. (E) Gene expression value (FPKM, y-axis) based on RNA-seq in the OMA3.01 line. The
green blocks indicate subdomains significantly enriched with CENH3, which are interspersed with subdomains that were not enriched with CENH3 (blocks
with no color). 
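
The size comparison can be checked against the values in Table 1. The calculation below is only a sketch: excluding Cen3 from both averages (because it relocated and its B73 size is likely underestimated) is an assumption made here for illustration, and how the ~1.8 Mb and ~3.6 Mb figures were actually derived is not stated in this section.

```python
from statistics import mean

# (size in maize, size in oat) in kb, copied from Table 1.
# None marks centromeres that could not be delineated in maize.
TABLE1_KB = {
    "Cen1": (None, 3307),
    "Cen2": (2027, 3513),
    "Cen3": (982, 4560),
    "Cen5": (1906, 3529),
    "Cen6": (None, 3572),
    "Cen8": (1969, 3840),
    "Cen9": (1651, 3742),
    "Cen10": (1436, 3307),
}


def mean_sizes(table, exclude=()):
    """Mean centromere size (kb) in maize and in oat, skipping excluded
    centromeres and maize entries that could not be measured."""
    maize = [m for name, (m, _) in table.items()
             if m is not None and name not in exclude]
    oat = [o for name, (_, o) in table.items() if name not in exclude]
    return mean(maize), mean(oat)


maize_mean, oat_mean = mean_sizes(TABLE1_KB, exclude=("Cen3",))
print(f"maize: {maize_mean / 1000:.2f} Mb, oat: {oat_mean / 1000:.2f} Mb, "
      f"ratio: {oat_mean / maize_mean:.2f}")
```

Under this assumption the averages come out near 1.8 Mb in maize and 3.5 Mb in oat, a ratio close to two.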

It seems highly unlikely that any newly formed centromere 
will spontaneously form at the optimum size. In fact, most human 
neocentromeres are smaller than the average native centromere 
(Irvine et al. 2004; Marshall et al. 2008). Therefore, we believe that 
the stabilization of neocentric chromosomes in natural pop- 
ulations will depend on whether the CENH3 domain of the neo- 
centromere can expand into the flanking regions. While OMA3.01 
was reported to be stable upon discovery (Muehlbauer et al. 2000), 
the original misdivision derived from it (neoM3) was mitotically 
unstable, and stable lines were ultimately selected and studied 
(Topp et al. 2009). Similarly, a highly unstable maize chromosome 
known as Dp3a was recently shown to have a neocentromere (Fu
et al. 2013). The instability of Dp3a is likely associated with the fact
that the neocentromere contains only a 350-kb CENH3-binding
domain and is located in a gene-dense area. 

Transcription and neocentromere establishment 

Centromeric chromatin is generally incompatible with gene 
transcription. Insertion of a marker gene in the centromeres of 
Schizosaccharomyces pombe chromosomes results in complete si- 
lencing of the gene (Allshire et al. 1995). Similarly, neocentromeres 
in multiple species generally form in gene-poor regions (Lomiento 
et al. 2008; Alonso et al. 2010; Shang et al. 2013); when they do form 
over genic areas, the affected genes are suppressed or silenced (Ishii
et al. 2008; Ketel et al. 2009; Shang et al. 2013). Why does
centromeric chromatin avoid actively transcribed genes? CENH3
nucleosomes cannot be modified by the histone modification pathways 
specific to the canonical H3 nucleosomes. CENH3 nucleosomes are 
also more compact and conformationally more rigid than H3 nu- 
cleosomes (Black et al. 2004). Thus, CENH3 chromatin is probably 
far less compatible with regulated transcription. In addition, gene 
transcription may actively evict CENH3 nucleosomes during 
periods of development when they cannot be readily replaced 
(Gassmann et al. 2012). Our data strongly support this extensive 
literature by demonstrating that 12 transcribed genes found in 
nCenM3 and nCen3 were all located within subdomains depleted of 
CENH3. The only exception was an internal domain of a long gene 
that contains a single 48-bp exon. 

We demonstrate that both nCen3 and nCenM3 represent the 
most gene-deficient regions on chromosome 3 (Fig. 2). In addition, 

millet chromosomes appeared to adapt to
the oat background in these hybrids. Since 
the millet genome (2450 Mb) (Martel et al. 
1997) is significantly smaller than the oat 
genome (11,300 Mb), our data suggest 
that the millet centromeres expanded to 
adapt to the overall larger genome envi- 
ronment. Interestingly, the centromeres 
of millet chromosomes contain large 
amounts of satellite repeats (Ishii et al. 
2013) and may lack active genes, which 
is favorable for centromere expansion. In 
wide crosses between a large genome spe- 
cies (such as oat and wheat) and a small 
genome species (such as maize, pearl mil- 
let, or sorghum), chromosomes from the 
small genome parent were often elimi- 
nated in early embryogenesis (Laurie and 
Bennett 1988; Ishii et al. 2013). We pro- 
pose that failure of centromere expansion 
of chromosomes derived from the small 
genome parent may be the key factor in 
chromosome elimination. 

Methods 
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Figure 6. RT-PCR analysis of three active genes located in the neocentromeres. (Lane 7) Primer 31 1 - 
92 amplified an 840-bp fragment from gene G31 7 in Seneca 60 (maize), Sun II (oat), OMA3.01 line, and 
neoM3 line. This fragment spans exon 1 to exon 5 of G3 7 7 . (Lane 2) Primer G31 1 -1 5 amplified a 707-bp 
fragmentfrom gene G3 7 7 in all four lines. This fragment spans exon 2 to exon 7 of G3 7 7. (Lanes 3,4,5,6) 
Four different maize-specific primers were designed to amplify different parts of G3 7 7 in the two oat- 
maize chromosomal addition lines. (Lanes 7,8) Two different maize-specific primers were designed to 
amplify gene G576 in the two oat-maize chromosomal addition lines. Note: Both primers amplified two 
bands in Seneca 60, suggesting that G576 has a homologous gene of G576 in a different maize chro- 
mosome. Both primers, however, amplified a single band in the two addition lines. (Lanes 9,10) Two 
different maize-specific primers were designed to amplify gene G959 in two oat-maize chromosomal 
addition lines. All primer sequences are provided in Supplemental Table S3. 



the expanded CENH3 domains on seven maize centromeres are 
also largely gene-deficient (Figs. 7, 8). Interestingly, a large and 
active gene was found to be near the boundaries of the expanded 
Cen2 (Fig. 7C) and Cen5 (Fig. 8). We postulate that the transcription 
of these large genes impedes the expansion of CENH3 domains, 
similar to barriers that block the spreading of heterochromatin 
(Noma et al. 2006; Scott et al. 2006). Some maize chromosomes were 
especially difficult to recover in oat-maize hybrids (Kynast et al. 
2001; Rines et al. 2009), suggesting that the expansion of centro- 
meres may have failed. For example, maize chromosome 3 was not 
recovered from the oat X B73 and oat X Mo 1 7 hybrids. In addition, 
the centromere of maize chromosome 3 recovered from the oat X 
Seneca 60 hybrid moved to a new position (Fig. 2C). Thus, active 
genes flanking Cen3 (Fig. 2D) may prohibit the expansion of this 
centromere, resulting in the loss of this chromosome or reposi- 
tioning of Cen3 in the oat background. 

Ishii et al. (2013) recently conducted wide crosses between oat 
and pearl millet (Pennisetum glaucum, 2n = 2x = 14) and developed 
true oat-millet hybrids that contain two complete haploid sets of 
all 21 oat and seven millet chromosomes (Ishii et al. 2013). The 



Plant materials 

Oat-maize chromosome addition line 
OMA3.01 contains maize chromosome 3 
derived from maize hybrid Seneca 60 
(Kynast et al. 2001). The oat-maize neoM3 
monosomic addition line was derived from 
OMA3.01 (Topp et al. 2009). Oat-maize 
chromosome addition lines OMA1.36, 
OMA2.51, OMA5.60, OMA6.34, OMA8.05, 
OMA9.41, and OMA10.26 contain maize 
chromosomes 1, 2, 5, 6, 8, 9, and 10, re- 
spectively, from maize inbred B73 (Rines 
et al. 2009). OMA3.01, OMA2.51, neoM3, 
and maize lines Seneca 60 and B73 were 
used in ChIP-seq and FISH experiments.
Seneca 60, B73, and oat cultivar Sun II 
were used in PCR and qPCR experiments. All plants were grown in 
greenhouses. Leaf tissues and root tips were collected from the 
plants for experiments. 

FISH, GISH, and chromosomal immunoassay 

FISH, GISH, and immunoassays on chromosomes were performed 
according to published protocols (Jiang et al. 1995; Jin et al. 2004).
In the GISH procedure, oat genomic DNA was used as a probe, 
and unlabeled maize genomic DNA was used as a blocker. For 
FISH identification of maize chromosome 3, we isolated an 8.7-kb 
DNA segment from the maize bacterial artificial chromosome 
ZMMBBb0013L21. This fragment is located in the distal region of 
the short arm of maize chromosome 3 and is named m3S8.7. 
Primers were designed based on the BAC sequence and then used
in PCR (Supplemental Table S2). PCR conditions were 94°C for
3 min, followed by 35 cycles of 95°C for 30 sec, 55°C for 90 sec, and 
72°C for 60 sec and ended by a 4-min extension at 72°C. PCR 
products were recovered by a Gel Extraction Kit (Qiagen, catalog no. 
28704). The 10 PCR products derived from the 8.7-kb DNA segment
were mixed and labeled as a FISH probe.
Figure 7. Expansion of Cen2 of B73 in the genetic background of oat. (A) The CENH3-binding domain (green box) of Cen2 in B73. The ChIP-seq read
number (y-axis) was calculated in 10-kb windows per million reads. (B) The CENH3-binding domain (green box) of Cen2 in oat-maize chromosomal
addition line OMA2.51. (C) The length (kb) of non-TE genes in 20-kb sliding windows (step = 10 kb). The red arrow points to a 58.2-kb transcribed gene
located 23 kb away from the CENH3-binding domain. (D) Distribution of non-TE genes along maize chromosome 2. Details of gene distribution within the
88-98 Mb region are exemplified in C.
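
The gene-length track described in the figure legends (length of non-TE genes in 20-kb sliding windows with a 10-kb step) can be illustrated with a minimal Python sketch. The function name, the half-open interval convention, and the assumption that gene intervals do not overlap one another are illustrative choices, not details given in the text.

```python
def gene_length_per_window(genes, region_start, region_end,
                           window=20_000, step=10_000):
    """Return [(window_start, gene_kb), ...] for sliding windows.

    genes: iterable of (start, end) half-open intervals in bp.
    Assumption: gene intervals do not overlap each other, so summing
    per-gene overlaps gives the gene-covered length of each window.
    """
    track = []
    start = region_start
    while start < region_end:
        stop = min(start + window, region_end)  # clip the last window
        covered = 0
        for g_start, g_end in genes:
            overlap = min(stop, g_end) - max(start, g_start)
            if overlap > 0:
                covered += overlap
        track.append((start, covered / 1000.0))  # report in kb
        start += step
    return track
```

Windows with a value near zero correspond to the gene-deficient stretches discussed in the text; a single long gene shows up as several consecutive windows that are almost fully covered.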



ChIP, ChIP-seq, and RNA-seq

A CENH3 antibody developed in rice recognizes both maize and 
oat CENH3 (Jin et al. 2004). This antibody was used for all im- 
munoassay and ChIP experiments. ChIP was conducted following 
a published protocol (Nagaki et al. 2003). Normal rabbit serum was 
used in a mock treatment as a negative control. ChIPed DNA was
then used for ChIP-seq library construction according to the
protocol provided by Illumina, including repairing the ends of DNA
fragments, poly(A) tailing of the 3' ends, ligation of paired-end 
adapters, fractionation of 150-300 bp adapter-ligated DNA using 
a 2% agarose gel, and enrichment of sized adapter-modified DNA 
fragments by PCR. The enriched DNA sample was sequenced using 
Illumina Genome Analyzer II or HiSeq platforms. 

RNA-seq was performed in both neoM3 and OMA3.01 lines. 
Two biological replicates of young leaf tissues harvested from both 
neoM3 and OMA3.01 were used for RNA-seq analysis. Total RNA 
was extracted using an RNeasy plant kit (Qiagen, catalog no. 74904). 
Approximately 40 μg total RNA was converted to cDNA using the
mRNA-seq kit from Illumina. RNA-seq libraries were constructed 
using a barcode method and sequenced using an Illumina Genome 
Analyzer II platform. 

ChIP-seq reads were mapped to the reference genome of maize
B73 version 2 using the MAQ alignment program (Li et al. 2008). We 
allowed 1-bp mismatch between each sequence read and the refer- 
ence genome, then kept only reads that mapped to a unique posi- 
tion in the reference genome for further analysis. We used TopHat 
(Trapnell et al. 2009) to map sequence reads from RNA-seq to the 
same reference genome and employed Cufflinks (Trapnell et al.
2010) to measure the difference in gene expression level between
OMA3.01 and neoM3. 
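
As an illustration of how uniquely mapped reads become the "10-kb windows per million reads" values plotted in the figures, the following Python sketch bins read start positions and scales the counts. It is a sketch under stated assumptions: the function name is hypothetical, and the choice of normalizing by the total number of uniquely mapped reads (rather than all sequenced reads) is an assumption, since the text does not specify the denominator.

```python
from collections import Counter


def reads_per_million_by_window(read_starts, total_unique_reads,
                                window=10_000):
    """Map window start (bp) -> read count per million reads.

    read_starts: start coordinates (bp) of uniquely mapped reads on
        one chromosome.
    total_unique_reads: normalization denominator (assumed here to be
        the genome-wide number of uniquely mapped reads).
    """
    counts = Counter(pos // window for pos in read_starts)
    scale = 1_000_000 / total_unique_reads
    return {idx * window: n * scale for idx, n in sorted(counts.items())}
```

Scaling by library size makes the B73 and oat-background profiles comparable even when the two ChIP-seq libraries were sequenced to different depths.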

Mapping and identification of CENH3-enriched regions
followed published protocols with only minor modifications (Yan
et al. 2008). We considered the genomic position of the starting
nucleotide of a unique read as a uniquely mappable region and
then calculated the number of unique read pairs per base pair
mappable region in 1-kb windows. We used these adjusted read
numbers to identify the enriched region of CENH3. We required
that the enriched window be P < 1 × 10⁻⁵ and that the CENH3
region include at least three continuous enriched windows.
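
The two criteria above (a per-window significance cutoff and a minimum run of three continuous enriched windows) can be sketched in Python. The statistical test is not specified here beyond the citation to Yan et al. (2008), so the one-sided Poisson test against a fixed background rate used below is an assumption, as are the function names and the use of raw integer counts in place of the mappability-adjusted read numbers.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); requires integer k and lam > 0."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
              for i in range(k))
    return max(0.0, 1.0 - cdf)


def enriched_regions(window_counts, background_rate, window=1_000,
                     p_cutoff=1e-5, min_windows=3):
    """Return [(start_bp, end_bp), ...] for runs of enriched windows.

    window_counts: integer read counts for consecutive 1-kb windows.
    background_rate: expected reads per window (assumed known).
    """
    regions = []
    run_start = None
    for idx, count in enumerate(window_counts):
        enriched = poisson_upper_tail(count, background_rate) < p_cutoff
        if enriched and run_start is None:
            run_start = idx                      # open a new run
        if not enriched and run_start is not None:
            if idx - run_start >= min_windows:   # keep runs of >= 3 windows
                regions.append((run_start * window, idx * window))
            run_start = None
    if run_start is not None and len(window_counts) - run_start >= min_windows:
        regions.append((run_start * window, len(window_counts) * window))
    return regions
```

Requiring several continuous windows filters out isolated windows that pass the cutoff by chance or because of a local mapping artifact.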



RT-PCR, qPCR, and ChIP-PCR

We used RT-PCR to examine the expression of three genes, G959,
G576, and G311, which are present in the neocentromere. Primers
were designed to have a length between 20 and 24 bp, with an
annealing temperature of 55°C-60°C (Supplemental Table S3).
RNA was isolated from leaf tissues collected from Seneca 60, Sun II,
OMA3.01, and neoM3. RT-PCR was conducted following a
published protocol (Yan et al. 2005). We then conducted qPCR to
quantify the transcript level of these three genes in the neoM3 and
OMA3.01 lines. Gene G311 is present in both maize and oat.
Therefore, the RT-PCR products amplified from Seneca 60 and Sun II
with two primers, G311-92 and G311-15, were sequenced to develop
maize-specific primers. Maize-specific primers were designed based on
the divergence of cDNA sequences between Sun II and Seneca 60
(Supplemental Table S3). Only maize-specific primers were then
used in qRT-PCR analysis. PCR reactions were carried out using
the DyNAmo SYBR Green qPCR kit (Thermo Scientific) and run at
95°C for 5 min, followed by 45 cycles of 95°C for 10 sec, 60°C for
20 sec, and 72°C for 30 sec. The glyceraldehyde-3-phosphate
dehydrogenase gene of oat was used as an internal reference as
previously described (Jarosova and Kundu 2010). For each gene, the
relative threshold cycle number was normalized over the internal
control as previously described (Yan et al. 2005).

Figure 8. Expansion of Cen1, Cen5, Cen6, Cen8, Cen9, and Cen10 of B73 in the genetic background of oat. The top panel of each centromere shows the
ChIP-seq read distribution in B73. CENH3 binding was detected in Cen5, Cen8, Cen9, and Cen10, but not in Cen1 and Cen6. The middle panel of each
centromere shows ChIP-seq read distribution in the oat background. The ChIP-seq read number (y-axis) in both panels was calculated in 10-kb windows
per million reads. The bottom panel of each centromere shows the length (kb) of non-TE genes in 20-kb sliding windows (step = 10 kb). The red arrow in
Cen5 points to a 136.8-kb transcribed gene flanking the long-arm boundary of the CENH3-binding domain. The horizontal red bar (indicated by a red
arrow) in Cen8 is a region that completely lacks ChIP-seq sequence reads. A deletion may have occurred in this region during the transfer of this maize
chromosome into oat. The expansion of Cen8 surpassed this deletion.

We conducted ChIP-PCR to verify the relative enrichment of
three genes in the CENH3-bound fraction over the mock control
(Supplemental Table S3). A primer (NEG79) designed from a region
of chromosome 3, located outside of the neocentromere, was used
as a negative control. We calculated the difference in the PCR
threshold cycle number to determine the relative enrichment of
each amplicon as described (Yan et al. 2005).
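
Both the qPCR normalization against the oat reference gene and the ChIP-PCR enrichment over the mock control rest on differences in threshold cycle (Ct). The Python sketch below shows the arithmetic under the common assumption of roughly 100% amplification efficiency, so that each cycle corresponds to a twofold change. The exact formula of Yan et al. (2005) is not restated in the text, so the expressions and function names here are illustrative assumptions.

```python
def relative_level(ct_target, ct_reference):
    """Transcript level of a target gene relative to the internal
    reference, assuming a twofold change per cycle: 2^-(Ct_t - Ct_r)."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """Ratio of normalized transcript levels, line A over line B
    (for example, neoM3 over OMA3.01)."""
    return (relative_level(ct_target_a, ct_ref_a)
            / relative_level(ct_target_b, ct_ref_b))


def chip_enrichment(ct_chip, ct_mock):
    """Fold enrichment of an amplicon in the CENH3-bound fraction over
    the mock control: a lower Ct in the ChIP sample means more template."""
    return 2.0 ** (ct_mock - ct_chip)
```

For instance, an amplicon that reaches threshold five cycles earlier in the CENH3 ChIP sample than in the mock sample would be scored as 32-fold enriched under this assumption, whereas a negative-control amplicon is expected to give a value close to 1.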

Data access 

The ChIP-seq reads associated with maize chromosomes from all
nine oat-maize chromosome addition lines and the RNA-seq data 
sets have been submitted to the NCBI Gene Expression Omnibus 
(GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession num- 
ber GSE47342. 